There is a class of simulation called "cellular automata" which use a set of discrete cells, with rules governing how the values in those cells change over time. The Wikipedia article shows a number of examples of places that cellular automata are used in research.
The name comes from "cellular" meaning "discrete boxes holding values" and "automaton" meaning "something which moves by itself".
We are going to create a very basic simulation which models the spread of a disease across a population of cells. In order to keep this as simple and understandable as possible, we will be ignoring things like cells moving around, contact time, resistance, etc.
Let's start by modelling our world as a grid. In fact, to keep it even simpler, let's keep it as a one-dimensional array. Each cell in the array is either healthy ( ), or is infected ( ).
Since we're working with code, we need to represent our array numerically where 0
is a healthy cell and 1
is an infected cell:
0 |
1 |
0 |
0 |
1 |
import numpy as np
first_array = np.array([0, 1, 0, 0, 1])
As in the previous chapter, this array is our "state". Instead of a single value per time step, we have an array of values. If we define some rules then we will be able to run the simulation.
The simplest possible rule I can think of for the spread of infection is (and remember we're not aiming for realism here) that if a healthy cell has an infected cell to its left then it becomes infected. I.e. the next state of the system would be:
0 |
1 |
1 |
0 |
1 |
Before we dive in with a full simulation, let's take a small step at a time. Instead of trying to implement the rule and update the state, let's just print some variables and see what's happening.
import numpy as np
first_array = np.array([0, 1, 0, 0, 1])
for i in range(first_array.shape[0]):
print(f"Looking at cell {i}")
index_of_cell_to_left = i - 1
print(f" The cell to the left is at index {index_of_cell_to_left}")
cell_to_left = first_array[index_of_cell_to_left]
print(f" The value of the cell to the left is {cell_to_left}")
python cells.py
Immediately here we can see something that maybe you predicted. What is the cell to the left of the 0th cell? There isn't one!
This brings us to one of the thing that you have to make a decision about when creating a cellular automaton: what do we do at the edges? It's perfectly valid to say that the array or grid should wrap around (this is called toroidal geometry), but it's also ok to say that it stops at each end and doesn't wrap. This decision may affect the outcome of the simulation and so either way a deliberate decision should be made.
If we subtract 1
from index 0
then we get index -1
which, while valid Python and numpy, is at the far end of the array (in our case it's equivalent to index 4
) so at the lower end the wrapping is happening automatically. In our case, I want to stop any wrapping from happening so let's add in a specific if
/else
to check the value of the index and set a default value of 0
in that case:
import numpy as np
first_array = np.array([0, 1, 0, 0, 1])
for i in range(first_array.shape[0]):
print(f"Looking at cell {i}")
index_of_cell_to_left = i - 1
print(f" The cell to the left is at index {index_of_cell_to_left}")
if index_of_cell_to_left < 0:
cell_to_left = 0
print(f" Out of bounds, use the default value of {cell_to_left}")
else:
cell_to_left = first_array[index_of_cell_to_left]
print(f" The value of the cell to the left is {cell_to_left}")
python cells.py
Then, depending on the value of cell_to_left
, we can decide what the new value of our cell should be. If cell_to_left
is infected then we become infected, otherwise, we remain the same.
To hold the result we make a new array (with np.zeros_like
) and fill it in as we loop over the old one:
import numpy as np
first_array = np.array([0, 1, 0, 0, 1])
new_array = np.zeros_like(first_array)
for i in range(first_array.shape[0]):
index_of_cell_to_left = i - 1
if index_of_cell_to_left < 0:
cell_to_left = 0
else:
cell_to_left = first_array[index_of_cell_to_left]
if cell_to_left == 1:
new_array[i] = 1
else:
new_array[i] = first_array[i]
print("Old array is:", first_array)
print("New array is:", new_array)
python cells.py
We can see here that in the old array, cells 0, 1, 3 and 4 have a (real or virtual) healthy cell to their left and so they remain unchanged. Cell 2 has an infected cell to its left and so it becomes infected.
It's important that the decision is made only on the previous state (i.e. the values in first_array
) and not on any newly-computed values in new_array
.
The simulation progresses by running that same algorithm over the new array, and so on. Let's put all that into a function update_state
similar to in the last chapter (only changing the variable name first_array
to be previous_array
):
def update_state(previous_array):
new_array = np.zeros_like(previous_array)
for i in range(previous_array.shape[0]):
index_of_cell_to_left = i - 1
if index_of_cell_to_left < 0:
cell_to_left = 0
else:
cell_to_left = previous_array[index_of_cell_to_left]
if cell_to_left == 1:
new_array[i] = 1
else:
new_array[i] = previous_array[i]
return new_array
Similarly we can create a new initialise_state
function to set up the starting conditions of the system. It makes a 2-dimensional array with "number of steps" rows and "size of state array" columns to hold the full history of the system:
def initialise_state(num_steps, array_size, initial_state):
cells = np.zeros((num_steps, array_size), dtype="uint8") # dtype for integers from 0-255
cells[0] = initial_state
return cells
Now we can use our run_simulation
function from the last chapter, changing only that it now takes an additional array_size
argument and passes that through to the initialise_state
function:
def run_simulation(num_steps, array_size, initial_state):
state = initialise_state(num_steps, array_size, initial_state)
for t in range(1, num_steps):
state[t] = update_state(state[t-1])
return state
Putting all of that together
import numpy as np
def initialise_state(num_steps, array_size, initial_state):
"""
Initialise the initial state of the system
The value ``array_size`` should match the size of ``initial_state``
Args:
num_steps (int): how many steps the simulation will run for
array_size (int): the size of each iteration's state
initial_state (list, np.ndarray): the 1-dimensional initial state of the system
Returns:
np.ndarray: the 2-dimensional container for state history with the initial state present
"""
cells = np.zeros((num_steps, array_size), dtype="uint8") # this dtype holds integers from 0-255
cells[0] = initial_state
return cells
def update_state(previous_array):
"""
Args:
previous_array (np.ndarray): the 1-dimensional previous state of the system
Returns:
np.ndarray: the 1-dimensional new state of the system
"""
new_array = np.zeros_like(previous_array)
for i in range(previous_array.shape[0]):
index_of_cell_to_left = i - 1
if index_of_cell_to_left < 0:
cell_to_left = 0
else:
cell_to_left = previous_array[index_of_cell_to_left]
if cell_to_left == 1:
new_array[i] = 1
else:
new_array[i] = previous_array[i]
return new_array
def run_simulation(num_steps, array_size, initial_state):
state = initialise_state(num_steps, array_size, initial_state)
for t in range(1, num_steps):
state[t] = update_state(state[t-1])
return state
number_of_steps = 5
initial_state = [0, 1, 0, 0, 1]
array_size = len(initial_state)
cells = run_simulation(number_of_steps, len(initial_state), initial_state)
print(cells)
np.savez_compressed("infection_simulation", state=cells)
python cells.py
Here the array cells
contains the value of the array at every step in the simulation. Each row is one time step.
Run a simulation with an array length of 20 and run it for 20 time steps. Set the initial array to any combination of 0
s and 1
s that you like.
Bonus: try setting the initial state randomly.
Printing out the values of the overall state is one way of visualising the progression of the simulation, but as they get more complex, you'll want to simplify the output down to the important values. In our case, perhaps we want to keep track of the number of infected cells as time progresses.
All the information we need is in our cells
variable so we can look through it and extract the data we want, even after the simulation has finished.
Let's create a Jupyter Notebook (or you could do this in another Python script if you like) and load in our data from the simulation:
import numpy as np
with np.load("infection_simulation.npz") as f:
cells = f["state"]
To count the values in a numpy array, you can use np.count_nonzero
which takes a "boolean array" (these are the same idea as our "filters" from Pandas) and counts the number of True
s:
np.count_nonzero(cells[0] == 1)
If we loop over cells
by row we can extract this count for every time step:
number_infected = []
for time_step_state in cells:
infected = np.count_nonzero(time_step_state == 1)
number_infected.append(infected)
print(number_infected)
We can then put this into a Pandas DataFrame
and analyse or plot it any way we wish:
import pandas as pd
summary = pd.DataFrame({"number_infected": number_infected})
summary.index.name = "Time step"
summary
import seaborn as sns
sns.set_theme()
g = sns.relplot(data=summary, kind="line").set(
xlim=(0,None),
ylim=(0,None),
)
If you want to save Pandas DataFrames
to files, there are number of options. The simplest for now is the HDF5 format:
summary.to_hdf("infection_summary.h5", "summary")
new_summary = pd.read_hdf("infection_summary.h5", "summary")
new_summary
But you could also use DataFrame.to_csv
and pd.read_csv
if you like.
It's quite common to have one Python script which runs the actual simulation, for example on a powerful HPC system, writing out an .npz
file. Then another which processes the output on that same large computer to create the summary information as a .h5
file. Once the data is down to a manageable size, you can then download it to your laptop to produce graphs etc.
Plot the number of infected cells over time for the data from the previous exercise (20 cells for 20 time steps)